



Formalized Hopfield Networks and Boltzmann Machines

Cipollina, Matteo, Karatarakis, Michail, Wiedijk, Freek

arXiv.org Artificial Intelligence

Neural networks are widely used, yet their analysis and verification remain challenging. In this work, we present a Lean 4 formalization of neural networks, covering both deterministic and stochastic models. We first formalize Hopfield networks, recurrent networks that store patterns as stable states. We prove convergence and the correctness of Hebbian learning, a training rule that updates network parameters to encode patterns, here limited to the case of pairwise-orthogonal patterns. We then consider stochastic networks, where updates are probabilistic and convergence is to a stationary distribution. As a canonical example, we formalize the dynamics of Boltzmann machines and prove their ergodicity, showing convergence to a unique stationary distribution using a new formalization of the Perron-Frobenius theorem.
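The deterministic part of this abstract (Hebbian storage of pairwise-orthogonal patterns and convergence to a stable state) can be illustrated with a small pure-Python sketch. It is not the Lean 4 development. The following are assumptions made for illustration:

- states in {-1, +1} and zero thresholds;
- the Hebbian rule W = (1/n) Σ p pᵀ with a zeroed diagonal;
- asynchronous updates in index order, where a neuron keeps its state when its local field is exactly zero.

Under these conventions every flip strictly lowers the energy, so the loop must stop at a fixed point. All function names are hypothetical.

```python
from typing import List

Vector = List[int]          # neuron states in {-1, +1}
Matrix = List[List[float]]


def hebbian_weights(patterns: List[Vector]) -> Matrix:
    """Hebbian rule W = (1/n) * sum_k p_k p_k^T, with the diagonal set to zero."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w: Matrix, s: Vector) -> float:
    """Hopfield energy E(s) = -1/2 * s^T W s (all thresholds assumed zero)."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def run_until_stable(w: Matrix, s: Vector, max_sweeps: int = 100) -> Vector:
    """Asynchronous updates s_i <- sign(sum_j W_ij s_j), in index order.

    On a tie (local field exactly 0) the neuron keeps its state, so every
    flip strictly lowers the energy and the loop reaches a fixed point.
    """
    s = list(s)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            new = 1 if h > 0 else -1 if h < 0 else s[i]
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


# Two pairwise-orthogonal patterns on n = 8 neurons.
p0 = [1, 1, 1, 1, 1, 1, 1, 1]
p1 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hebbian_weights([p0, p1])

probe = [-1] + p0[1:]           # p0 with its first neuron flipped
recalled = run_until_stable(W, probe)
print(recalled == p0, energy(W, recalled) <= energy(W, probe))  # True True
```

With m orthogonal patterns on n neurons, each stored pattern p satisfies W p = (1 - m/n) p. Each pattern is therefore a fixed point whenever m < n.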



Towards Formalizing Reinforcement Learning Theory

Zhang, Shangtong

arXiv.org Machine Learning

In this paper, we formalize the almost sure convergence of $Q$-learning and linear temporal difference (TD) learning with Markovian samples, using the Lean 4 theorem prover and the Mathlib library. $Q$-learning and linear TD are among the earliest and most influential reinforcement learning (RL) algorithms. Their convergence properties were a major research topic in the early development of the RL field and continue to attract attention today. This paper formally verifies their almost sure convergence in a unified framework based on the Robbins-Siegmund theorem. The framework developed in this work can be easily extended to convergence rates and other modes of convergence. This work thus takes an important step toward fully formalizing convergent RL results. The code is available at https://github.com/ShangtongZhang/rl-theory-in-lean.
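As an informal companion to the abstract, here is a minimal sketch of tabular $Q$-learning driven by Markovian samples, meaning a single trajectory in which each sample starts where the previous one ended. It is not the Lean formalization.

The setup is a hypothetical toy example:

- two states and two actions, where action a moves the agent to state a;
- reward 1 for action 1 and 0 for action 0;
- discount γ = 0.5;
- a uniformly random behaviour policy.

The step size α = n^-0.6, with n the visit count of the state-action pair, is one choice satisfying the Robbins-Monro conditions. For this MDP the optimal values are Q*(s, 1) = 2 and Q*(s, 0) = 1.

```python
import random


def q_learning(num_steps: int = 20_000, gamma: float = 0.5, seed: int = 0):
    """Tabular Q-learning on a hypothetical 2-state, 2-action MDP.

    Toy dynamics (assumed for illustration): taking action a moves the agent
    to state a, and the reward is 1 for action 1 and 0 for action 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]      # q[state][action]
    visits = [[0, 0], [0, 0]]         # per-pair visit counts for the step sizes
    s = 0
    for _ in range(num_steps):
        a = rng.randrange(2)          # uniform (off-policy) behaviour policy
        reward = 1.0 if a == 1 else 0.0
        s_next = a
        visits[s][a] += 1
        # alpha_n = n^-0.6: sum(alpha) diverges while sum(alpha^2) converges.
        alpha = visits[s][a] ** -0.6
        target = reward + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
        s = s_next                    # next sample continues the same trajectory
    return q


q = q_learning()
print([[round(v, 3) for v in row] for row in q])  # [[1.0, 2.0], [1.0, 2.0]]
```

The formalized result covers the general almost-sure statement. This sketch only shows the update rule behaving as expected on one fixed seed.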


FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models

Modoranu, Ionut-Vlad, Safaryan, Mher, Schultheis, Erik, Ryabinin, Max, Chumachenko, Artem, Alistarh, Dan

arXiv.org Artificial Intelligence

Low-rank optimization has emerged as a promising direction in training large language models (LLMs) to improve running time and reduce the memory usage of adaptive optimizers by constraining learning to a lower-dimensional space. Prior work typically projects gradients of linear layers using approaches based on Singular Value Decomposition (SVD) or QR decomposition. Applying these techniques individually to each layer in large models is computationally expensive and incurs additional memory costs due to storing the projection matrices. In this work, we propose a computationally efficient and conceptually simple two-step procedure to approximate SVD/QR-based gradient projections into lower-dimensional spaces by using a predefined orthogonal matrix of the Discrete Cosine Transform (DCT). We dynamically select columns from the DCT matrix based on their alignment with the gradient of each layer. The effective projection matrices are obtained via a simple matmul with the DCT matrix in $O(n^3)$ time, followed by a lightweight sorting step to identify the most relevant basis vectors. For large layers, the DCT can be computed via Makhoul's $N$-point algorithm based on the Fast Fourier Transform (FFT) in $O(n^2 \log(n))$ time. Because the orthogonal bases are predefined, they are computed only once at the start of training. Our numerical experiments on both pre-training and fine-tuning tasks demonstrate the effectiveness of our dual strategy in approximating optimal low-rank projections. The resulting approach has rank-independent running time and matches the performance of costly SVD/QR-based methods, while achieving faster runtime and reducing memory usage by up to $25\%$ across different model sizes. Our code is available at https://github.com/IST-DASLab/ISTA-DASLab-Optimizers.
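For intuition about the two-step selection (one matmul with a fixed DCT basis, then a sort), here is a small pure-Python sketch. The following choices are our assumptions, not the paper's:

- the layer gradient G is right-multiplied by an orthonormal DCT-II matrix D;
- a column's "alignment" is scored by the squared norm of the matching column of G D;
- the top-`rank` columns form the projection.

The DCT matrix is built directly from cosines rather than with Makhoul's FFT-based algorithm. All helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis: column k is the k-th cosine basis vector."""
    d = [[0.0] * n for _ in range(n)]
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        for i in range(n):
            d[i][k] = scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
    return d


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def select_dct_columns(grad: Matrix, d: Matrix, rank: int) -> List[int]:
    """Score each DCT column by ||grad @ d[:, k]||^2 and keep the top `rank` indices."""
    s = matmul(grad, d)                      # step 1: one matmul against the fixed basis
    scores = [sum(row[k] ** 2 for row in s) for k in range(len(d[0]))]
    # step 2: lightweight sort to pick the best-aligned basis vectors
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)[:rank]


def project(grad: Matrix, d: Matrix, cols: List[int]) -> Matrix:
    """Low-rank gradient grad @ P, where P stacks the selected DCT columns."""
    p = [[row[k] for k in cols] for row in d]
    return matmul(grad, p)


n = 8
d = dct_matrix(n)
u = [1.0, -2.0, 0.5]
grad = [[ui * d[i][3] for i in range(n)] for ui in u]   # rank-1 gradient along DCT column 3
cols = select_dct_columns(grad, d, rank=1)
low = project(grad, d, cols)
print(cols)  # [3]
```

Because D is fixed, it can be computed once and reused across steps. Only the cheap scoring and sorting depend on the current gradient.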



In-Context Learning for Pure Exploration

Russo, Alessio, Welch, Ryan, Pacchiano, Aldo

arXiv.org Artificial Intelligence

We study the problem of active sequential hypothesis testing, also known as pure exploration: given a new task, the learner adaptively collects data from the environment to efficiently determine the underlying correct hypothesis. A classical instance of this problem is identifying the best arm in a multi-armed bandit (Best-Arm Identification, BAI), where actions index hypotheses. Another important case is generalized search: determining the correct label through a sequence of strategically selected queries that indirectly reveal information about the label. In this work, we introduce In-Context Pure Exploration (ICPE), which meta-trains Transformers to map observation histories to query actions and a predicted hypothesis, yielding a model that transfers in-context. At inference time, ICPE actively gathers evidence on new tasks and infers the true hypothesis without parameter updates. Across deterministic, stochastic, and structured benchmarks, including BAI and generalized search, ICPE is competitive with adaptive baselines while requiring no explicit modeling of information structure. Our results support Transformers as practical architectures for general sequential testing.
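To make the interaction protocol concrete, here is a minimal sketch of the pure-exploration loop for Bernoulli best-arm identification. The loop has two components:

- a policy that maps the observation history to the next query;
- a predictor that maps the final history to a hypothesis.

In ICPE both roles are played by a meta-trained Transformer. Here they are replaced by hypothetical hand-written stand-ins: "pull the least-sampled arm" and "report the arm with the highest empirical mean". This is not the paper's method.

```python
import random
from typing import Callable, List, Sequence, Tuple

History = List[Tuple[int, float]]   # (arm pulled, observed reward)
Policy = Callable[[History, int], int]


def least_pulled_policy(history: History, num_arms: int) -> int:
    """Hypothetical stand-in for a learned query policy: pull the least-sampled arm."""
    counts = [0] * num_arms
    for arm, _ in history:
        counts[arm] += 1
    return min(range(num_arms), key=lambda a: counts[a])


def empirical_best_arm(history: History, num_arms: int) -> int:
    """Hypothetical stand-in for a learned predictor: arm with the highest mean reward."""
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    for arm, r in history:
        sums[arm] += r
        counts[arm] += 1
    return max(range(num_arms),
               key=lambda a: sums[a] / counts[a] if counts[a] else float("-inf"))


def pure_exploration(means: Sequence[float], budget: int,
                     policy: Policy, predictor: Policy, seed: int = 0) -> int:
    """Query the environment `budget` times, then output a hypothesis (an arm index)."""
    rng = random.Random(seed)
    history: History = []
    for _ in range(budget):
        arm = policy(history, len(means))
        reward = 1.0 if rng.random() < means[arm] else 0.0   # Bernoulli arm
        history.append((arm, reward))
    return predictor(history, len(means))


guess = pure_exploration([0.2, 0.8, 0.5], budget=600,
                         policy=least_pulled_policy, predictor=empirical_best_arm)
print(guess)
```

A learned policy would instead concentrate queries on the arms that are hardest to distinguish. This is where adaptive methods save samples over uniform allocation.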


Russia-Ukraine war: List of key events, day 1,314

Al Jazeera

Russia's President Vladimir Putin said his forces are prevailing in what he described as a "righteous battle" in Ukraine. "Our fighters and commanders go on the attack, and the entire country, all of Russia, is waging this righteous battle and working hard," he said.


Canonical Representations of Markovian Structural Causal Models: A Framework for Counterfactual Reasoning

de Lara, Lucas

arXiv.org Artificial Intelligence

Counterfactual reasoning aims at answering contrary-to-fact questions like "Would Alice have recovered had she taken aspirin?" and corresponds to the most fine-grained layer of causation. Critically, while many counterfactual statements cannot be falsified, even by randomized experiments, they underpin fundamental concepts like individual-wise fairness. Therefore, providing models to formalize and implement counterfactual beliefs remains a fundamental scientific problem. In the Markovian setting of Pearl's causal framework, we propose an alternative to structural causal models for representing counterfactuals compatible with a given causal graphical model. More precisely, we introduce counterfactual models, also called canonical representations of structural causal models. They enable analysts to choose a counterfactual assumption via random-process probability distributions with preassigned marginals, and they characterize the counterfactual equivalence class of structural causal models. Using these representations, we present a normalization procedure to disentangle the (arbitrary and unfalsifiable) counterfactual choice from the (typically testable) interventional constraints. In contrast to structural causal models, this allows one to implement many counterfactual assumptions while preserving interventional knowledge, and it requires no estimation step at the individual-counterfactual layer, only a choice. Finally, we illustrate the specific role of counterfactuals in causality and the benefits of our approach on theoretical and numerical examples.
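The point that a counterfactual assumption is a choice made on top of fixed interventional marginals can be seen in a toy example. It is our own illustration, not the paper's canonical-representation construction. The setup is:

- a binary treatment T and a binary outcome Y;
- fixed marginals P(Y = 1 | do(T = 0)) = 3/10 and P(Y = 1 | do(T = 1)) = 7/10;
- two hypothetical couplings of the potential outcomes (Y0, Y1): independent and comonotone (one shared uniform noise).

Both couplings reproduce the same interventional marginals, yet they answer the Alice-style query P(Y1 = 1 | Y0 = 0) differently.

```python
from fractions import Fraction
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int], Fraction]   # P(Y0 = a, Y1 = b)


def independent_coupling(p0: Fraction, p1: Fraction) -> Joint:
    """Potential outcomes Y0 and Y1 drawn independently."""
    return {(a, b): (p0 if a else 1 - p0) * (p1 if b else 1 - p1)
            for a in (0, 1) for b in (0, 1)}


def comonotone_coupling(p0: Fraction, p1: Fraction) -> Joint:
    """Y_t = 1 iff U < p_t for one shared U ~ Uniform(0, 1)."""
    return {
        (1, 1): min(p0, p1),
        (1, 0): max(Fraction(0), p0 - p1),
        (0, 1): max(Fraction(0), p1 - p0),
        (0, 0): 1 - max(p0, p1),
    }


def marginal(joint: Joint, t: int) -> Fraction:
    """P(Y_t = 1), i.e. the interventional quantity P(Y = 1 | do(T = t))."""
    return sum((pr for (a, b), pr in joint.items() if (b if t else a) == 1), Fraction(0))


def recover_had_treated(joint: Joint) -> Fraction:
    """Counterfactual P(Y1 = 1 | Y0 = 0): would a non-recovering untreated unit recover if treated?"""
    return joint[(0, 1)] / (joint[(0, 0)] + joint[(0, 1)])


p0, p1 = Fraction(3, 10), Fraction(7, 10)
for joint in (independent_coupling(p0, p1), comonotone_coupling(p0, p1)):
    print(marginal(joint, 0), marginal(joint, 1), recover_had_treated(joint))
# 3/10 7/10 7/10
# 3/10 7/10 4/7
```

No experiment on T can separate the two couplings, since both agree on every interventional marginal. Picking one is exactly the unfalsifiable counterfactual choice the abstract describes.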